Aero-Engine Remaining Useful Life Prediction Based on Bi-Discrepancy Network

Most unsupervised domain adaptation (UDA) methods align feature distributions across different domains through adversarial learning. However, many of them require introducing an auxiliary domain alignment model, which incurs additional computational costs. In addition, they generally focus on the global distribution alignment and ignore the fine-grained domain discrepancy, so target samples with significant domain shifts cannot be detected or processed for specific tasks. To solve these problems, a bi-discrepancy network is proposed for the cross-domain prediction task. Firstly, target samples with significant domain shifts are detected by maximizing the discrepancy between the outputs of the dual regressor. Secondly, the adversarial training mechanism is adopted between the feature generator and the dual regressor for global domain adaptation. Finally, the local maximum mean discrepancy is used to locally align the fine-grained features of different degradation stages. In 12 cross@-domain prediction tasks generated on the C-MAPSS dataset, the root-mean-square error (RMSE) was reduced by 77.24%, 61.72%, 38.97%, and 3.35% on average, compared with the four mainstream UDA methods, which proved the effectiveness of the proposed method.


Introduction
Aero engine prognostics and health management (PHM) is an indispensable technology to enhance production dependability, operational safety, equipment maintenance effectiveness, and cost efficiency [1][2][3].The remaining useful life (RUL) prediction is a crucial issue in the PHM sector and has great research value [4,5].The system maintenance process can be considerably optimized by an accurate RUL prediction, which can assess the system's health and assist users in developing reasonable maintenance plans [6,7].
The existing RUL prediction approaches are roughly divided into physical-modelbased and data-driven approaches.The standard physical-model-based prediction method struggles to provide an accurate deterioration model because of aero engines' complicated structure, the mathematical model's severe nonlinearity, and the close coupling between sensor data [8].In contrast, data-driven prediction methods based on a large amount of sensor data primarily rely on the corresponding intelligent algorithms to learn and characterize the degradation process of the system.These practical methods do not necessitate a deep understanding of the system's internal workings or the intricate degradation mechanism [9].Notably, numerous data-driven RUL-based prediction techniques surfaced in response to the quick development of sensor technologies.
Nevertheless, data-driven methods have some limits for the RUL prediction in industrial applications due to the following reasons: (1) the modeling process requires a large number of labeled datasets, and collecting enough labeled data is often complicated in many practical applications [10,11]; (2) it is assumed that offline training data and online test data come from the same feature space and obey the same distribution.However, in actual prediction tasks, there are invariably domain disparities between training and test data due to differences in the equipment operating circumstances and failure modes [12][13][14].
In order to improve the generalization ability of the model under different operating conditions and failure modes, domain adaptation (DA)-based transfer learning (TL) models were widely developed and applied to cross-domain RUL prediction tasks [15], which aim at extracting domain-invariant features between the source and target domains by using labeled samples from the source domain and a small number of unlabeled samples from the source domain to achieve an online RUL prediction in the target domain.Transfer learning is generally split into adversarial transfer learning and feature-based transfer learning.In order to minimize the distributional differences among features, the core of featurebased approaches entails mapping cross-domain data to a common feature space [16].For example, Sun et al. proposed a sparse self-encoder-based deep migration learning network.They investigated three migration strategies, namely, weight migration, feature migration, and weight updating, to achieve an RUL prediction for machine tools [17].Mao et al. used transfer component analysis (TCA) to sequentially adjust the features of the target bearing from those of the auxiliary bearings and then used the modified features to predict the RUL [18].The adversarial-based approach draws on generative adversarial networks and consists of generative and discriminative models.The generative model extracts features from the source and target domains, making the discriminative model unable to distinguish which domain the extracted features come from.In contrast, the discriminative model maximally distinguishes which domain the extracted features come from, and the two play with each other to realize cross-domain feature migration.For example, Costa et al. combined a domain-adversarial neural network (DANN) with LSTM to reduce the distributional differences in cross-domain RUL prediction [19].Ragab et al. proposed a comparative adversarial domain adaptive (CADA) method for the cross-domain RUL prediction of aero engines, which developed an adversarial domain adaptive architecture with contrast loss while learning domain-invariant features [20].
Despite some success in extracting domain-invariant features using the migration learning techniques mentioned above, it is essential to note that these issues still exist: (1) The sensor data of complex systems collected under different operating conditions and failure modes have significant distributional differences, and sensor data at multiple degradation stages also have different distributions.Therefore, solely adopting the global domain adaptation (GDA) technique could confuse the fine-grained features between the sub-domains denoted by various degradation stages [7], lowering RUL prediction performance.Furthermore, if the sub-domain adaptation (SDA) is directly carried out, the sub-domains might not be aligned because of the significant distributional variations between the source and target domains, which are shown in Figure 1.(2) Most of the current adversarial-based transfer learning techniques call for introducing an auxiliary model (discriminator) for the domain alignment, isolating the domain adaptation task from the prediction goal, and adding to the computational burden.(3) Under particular tasks, domain discriminators cannot discover or process target samples with significant domain shifts, as they can only classify features.
This paper proposes an aero-engine RUL prediction model based on a bi-discrepancy network (BDnet) to solve the above issues.First, target samples with significant domain shifts are identified by maximizing the regressor discrepancy without introducing additional models.Then, domain-invariant features are searched for based on the adversarial relationship between the feature generator and the dual regressor to achieve the GDA.Second, the fine-grained features of multiple degradation stages are extracted by the local maximum mean discrepancy (LMMD) to achieve the SDA.In particular, the SDA is performed on top of the GDA, which helps resolve the issue that direct sub-domain adaptation is ineffective due to the significant disparity in distribution between the source and target domains.Finally, 12 cross-domain prediction tasks from the aviation turbofan engines' C-MAPSS dataset are used for evaluation.The outcomes demonstrate that the RUL prediction This paper proposes an aero-engine RUL prediction model based on a bi-disc network (BDnet) to solve the above issues.First, target samples with significant shifts are identified by maximizing the regressor discrepancy without introduci tional models.Then, domain-invariant features are searched for based on the adv relationship between the feature generator and the dual regressor to achieve th Second, the fine-grained features of multiple degradation stages are extracted by maximum mean discrepancy (LMMD) to achieve the SDA.In particular, the SDA formed on top of the GDA, which helps resolve the issue that direct sub-domain tion is ineffective due to the significant disparity in distribution between the sou target domains.Finally, 12 cross-domain prediction tasks from the aviation turb gines' C-MAPSS dataset are used for evaluation.The outcomes demonstrate that prediction model based on the BDnet outperforms various popular feature-based learning algorithms and adversarial-based transfer learning techniques.
The main contributions of this article are summarized as follows.


First, the maximum regressor discrepancy (MRD) is proposed to detect and target samples with significant domain shifts without introducing additiona to achieve the GDA. Second, the local maximum mean discrepancy (LMMD) is designed to capt grained information from multiple degradation stages, from which the do variant features can be better learned to promote RUL prediction performan  Finally, the proposed BDnet-based RUL prediction approach is evaluated b craft turbofan engine dataset under cross-domain conditions.The research denote that the BDnet-based approach is better than some deep learning app without transfer learning and conventional transfer learning approaches.
The remainder of this paper is organized as follows.Section 2 discusses the cal background of MRD and the local maximum mean discrepancy in detail.Then The main contributions of this article are summarized as follows.

•
First, the maximum regressor discrepancy (MRD) is proposed to detect and handle target samples with significant domain shifts without introducing additional models to achieve the GDA.

•
Second, the local maximum mean discrepancy (LMMD) is designed to capture finegrained information from multiple degradation stages, from which the domaininvariant features can be better learned to promote RUL prediction performance.

•
Finally, the proposed BDnet-based RUL prediction approach is evaluated by an aircraft turbofan engine dataset under cross-domain conditions.The research results denote that the BDnet-based approach is better than some deep learning approaches without transfer learning and conventional transfer learning approaches.
The remainder of this paper is organized as follows.Section 2 discusses the theoretical background of MRD and the local maximum mean discrepancy in detail.Then, Section 3 discusses the RUL prediction method based on the BDnet in detail.Section 4 shows the results of the case study.Finally, Section 5 concludes this paper.

Maximum Regressor Discrepancy
In this paper, we draw on the concept of the maximum classifier discrepancy (MCD) [21] and propose the maximum regressor discrepancy to detect target domain samples that are significantly offset from the distribution of the source domain samples, as well as utilizing an adversarial training mechanism to gradually achieve the global domain alignment.
The distributional discrepancies between the source and target domain, in MCD's opinion, may result in either excellent or unsatisfactory classification results if the network trained in the source domain is applied straight to the target domain.The target domain samples with subpar classification outcomes more accurately denote the domain variations.Two separate classifiers, F 1 and F 2 , are introduced to detect and process these samples.If there is a significant variation in the classification outcomes of the two classifiers for the same sample, the sample is deemed to have a low confidence level and requires retraining.
The specific training process Is as follows: First, the cross-entropy loss function is used to train the feature generator, G, and classifiers, F 1 and F 2 , to correctly classify the source samples, which can be expressed as min where x s , x t denote a labeled source data and a corresponding label, y s , drawn from a set of labeled source data {X s , X t }, respectively.K denotes the number of categories, I [k=y s ] denotes the one-hot vector of labels, and p(y|x s ) denotes the K-dimensional probabilistic outputs for the input, x s .
Second, on the basis of guaranteeing the classification ability of the model, the feature generator, G, is fixed, and the difference loss function is used to train two different classifiers, F 1 and F 2 , to detect the samples with large domain shifts, which can be expressed as min where x t denotes an unlabeled target image drawn from the unlabeled target images, X t .p 1 (y|x t ), p 2 (y|x t ) denote the K-dimensional probabilistic outputs for the input, x t , obtained by F 1 and F 2 , respectively.Finally, the feature generator, G, is trained to minimize the classifier discrepancies, as shown in Equation (3).min G L adv (X t ) Unlike MCD, MRD consists of a feature extractor, F, and two regressors, R, R, with identical structures.
The feature extractor takes the labeled source domain data, , and the unlabeled target domain data, , as inputs to generate high-level feature denotations.However, the significant cross-domain difference results in a feature distribution mismatch.Figure 2 shows the data distribution of four randomly selected sensors in four subsets of the C-MAPSS dataset.Among them, the data distributions of FD001 and FD003 are more similar, and the data distributions of FD002 and FD004 are more similar, which are mainly related to the failure modes and operating conditions of the dataset.However, the data distribution of the same sensor in the four subsets is significantly different.Therefore, the network parameters learned from single-source domain data cannot directly and accurately predict the target domain data.
In order to align the source and target domain features, this paper uses the dual regressor as a discriminator to detect target domain samples with significant domain shifts.Suppose the dual regressor gives widely differing predictions for the same target domain sample.In this case, it is considered a domain-inconsistent sample with high confidence.At this point, training the feature generator can reduce this inconsistency.Unlike the classification task, since the labeled domain of the regression task is continuous and infinite, it is impossible to measure the discrepancies among the target domain samples by measuring the error between the conditional probability outputs of the two networks.
dual regressor gives widely differing predictions for the same target domain sample.In this case, it is considered a domain-inconsistent sample with high confidence.At this point, training the feature generator can reduce this inconsistency.Unlike the classification task, since the labeled domain of the regression task is continuous and infinite, it is impossible to measure the discrepancies among the target domain samples by measuring the error between the conditional probability outputs of the two networks.Considering the data imbalance problem, this paper utilizes the Tversky coefficient to quantify the difference between the dual regressors; however, the non-differentiable nature of the Tversky coefficient leads to the problem of updating the parameters of the network, so a differentiable form of the Tversky coefficient is proposed and is given by where 0.5    denotes that the same weight is given to both regressors.
tion of the characteristics of the different stages for the two regressors. denotes the Hadamard product.
To facilitate the understanding of the process of implementing the GDA using a dual regression network, Figure 3 further demonstrates the three main steps.Considering the data imbalance problem, this paper utilizes the Tversky coefficient to quantify the difference between the dual regressors; however, the non-differentiable nature of the Tversky coefficient leads to the problem of updating the parameters of the network, so a differentiable form of the Tversky coefficient is proposed and is given by where α = β = 0.5 denotes that the same weight is given to both regressors.
i , y T i denote the combination of the characteristics of the different stages for the two regressors.⊗ denotes the Hadamard product.
To facilitate the understanding of the process of implementing the GDA using a dual regression network, Figure 3 further demonstrates the three main steps.

•
Step A: In order to ensure the prediction ability of the model for the source domain samples, the feature extractor and the dual regressors are pre-trained using the source domain data to minimize the mean squared error (MSE), as shown in Equation ( 5): • Step B: In this step, we train the dual regressors as a discriminator for a fixed feature extractor.Since two regressors can easily produce inconsistent predictions for target instances with different sources, the prediction discrepancy means that two regressors involve different predictive abilities for the target samples.Thus, we utilize the prediction inconsistency to detect those target samples.Specifically, by training the dual regressors to increase the discrepancy, if the source and target regressors provide different predictions for the same target samples, we consider them as target domain samples with significant domain shifts.This step corresponds to Step B in Figure 3.
We add a regression loss to the source samples to guarantee the predictive ability of the model for the source samples.We experimentally find that our algorithm's performance significantly drops without this loss.The objective is as follows: Sensors 2023, 23, x FOR PEER REVIEW 6 of 19

L y y  Regression Loss r L
Step A: Pretraining using source domain data Step B: Maximizing dual regressor discrepancies using target domain data Minimizing the mean square error using source domain data Step C: Minimizing dual regressor discrepancies using target domain data Minimizing LMMD using source and target domain data


Step A: In order to ensure the prediction ability of the model for the source domain samples, the feature extractor and the dual regressors are pre-trained using the source domain data to minimize the mean squared error (MSE), as shown in Equation ( 5):


Step B: In this step, we train the dual regressors as a discriminator for a fixed feature extractor.Since two regressors can easily produce inconsistent predictions for target

•
Step C: We train the feature extractor to minimize the discrepancy for fixed dual regressors to move these target samples into the source support set.This step corresponds to Step C in Figure 3.This step denotes the trade-off between the extractor and the dual regressors.The objective is as follows:

Local Maximum Mean Discrepancy
Traditional DA methods usually use the maximum mean discrepancy (MMD) [22] to measure the discrepancy between the data distribution in the source domain and the data distribution in the target domain, and the MMD can be expressed as where H denotes the reproducing kernel Hilbert space (RKHS) defined by the salient kernel.k means the kernel function relation k x S , x T = φ x S , φ x T .φ defines the transformation from the initial data to the RKHS.• means the inner product operation.In this paper, based on MMD, considering the intrinsic relationship of each sub-domain in the different degradation stages of the aero engine, the local maximum mean discrepancy (LMMD) is utilized to denote the discrepancy in the data distribution between the source domain and the target domain, as shown in Equation ( 9): where p c and q c denote the distribution of subdomains belonging to category c in the source and target domains, respectively.w S c i and w T c j denote the weights of the i-th source domain sample and the j-th target domain sample belonging to category c, respectively, besides Unlike the classification problem with finite-dimensional data labels, regression problems have infinite labels.Therefore, it is not possible to directly use LMMD.In this paper, we utilize the discretization strategy of the degradation process proposed by Zhang et al. [7] to discretize the whole degradation process into multiple degradation stages, and the i-th degradation stage can be expressed as where RUL max = 125 is the threshold of the piecewise linear degradation model.In detail, the early operating stages of an aircraft turbofan engine can be considered healthy.When the RUL is less than a threshold, the turbofan engine begins to degrade [23].RUL i t is the actual remaining life of the i-th sample, and N c = 10 is the number of divided degradation stages.Therefore, for sample x i , w c i can be expressed as where l c i is the c-th element of the vector generated by l i after one-hot encoding.For the source domain samples, the actual RUL labels can be directly utilized to compute l i and, hence, w S c i .However, for the target domain samples, there are no actual RUL labels used to compute w c i .Therefore, in this paper, we utilize a dual regression network for prediction and compute y T j = ŷT j + y T j /2 as the RUL labels of the target domain samples to obtain l i , and, thus, w T c j .Thus, Equation ( 9) can be rewritten as Sensors 2023, 23, 9494 where f S/T = F X S/T is the feature extracted by the feature extractor.

RUL Prediction Method Based on BDnet
Section 2 explains the details of implementing the BDnet to seek a domain-invariant representation for cross-domain regression.In this section, the prediction process of the RUL prediction method based on the BDnet is further explained.The RUL prediction can generally be divided into the offline training phase and the online prediction phase.For , are divided by using the time window to obtain the training data.Secondly, the mapping relationship between the source domain data and the RUL is constructed by the source domain data, D S , and the GDA of the source and target domains is realized according to Equations ( 1)-( 4).Finally, the network parameters are tuned using the LMMD to realize the SDA.For the online prediction stage, the partitioned target domain data are fed into the BDnet to obtain the mean and variance of the RUL prediction.Since the sensor values at the current moment in the aero engine degradation process are closely related to the degradation states at the time steps before and after, Bi-LSTM [24] is used as a feature extractor to extract the time dependence from the sensor data in this paper.Then, to improve the accuracy of the RUL and quantify the uncertainty of the prediction results, Bayesian neural networks (BNNs) [25] with Monte Carlo dropout inference are utilized as a regressor for uncertainty prediction.The pseudo-code of the RUL prediction method based on the BDnet is shown in Algorithm 1. Update θ f , θ R, θ R using the back-propagation method to minimize L R .

7:
Input D S to Bi-Discrepancy Network for calculating regression loss L R through Equation (2) 1111wand input D T to Bi-Discrepancy Network for calculating dual regression discrepancy L tv 1111wthrough Equation (1).

8:
Update θ R, θ R using the back-propagation method to minimize L R + L tv .9: Calculate source domain data's degenerative level l S i through Equation ( 7).10: Input D T to Bi-Discrepancy Network for calculating ŷT j , y T j .

11:
Calculate target domain data's degenerative level l T j with RUL j t = ŷT j + y T j /2 through 11111wEquation (7).12: Calculate source weights and target weights w S c i , w T c j of LMMD.where λ is a hyperparameter used to control the ratio between GDA and SDA.

Case Study 4.1. Dataset and Data Preprocessing
First, this paper uses the C-MAPSS dataset provided by the NASA-Ames Research Center [26], divided into four subsets, FD001~FD004, according to different operating conditions and failure modes.Each subset consists of a training set and a test set.The training dataset contains the whole lifetime data, and the testing dataset includes the initial degradation data within a period of time, which are used for model validation.The detailed setup is shown in Table 1.Since the four sub-datasets were obtained under different failure modes and operating conditions, the data distributions significantly differed.In this sense, the research objective of this paper is to train the BDnet only using the labeled source domain data under a certain operational condition and the unlabeled target domain data under another operational condition.Furthermore, the RUL prediction is implemented for the online target domain data.The monitoring data for each flight cycle consist of 26 dimensions of feature data, where the first two dimensions denote the engine (unit) number and the cycle number.The following three dimensions are the flight conditions (flight altitude, Mach number, and throttle-stick solver angle), and the remaining 21 dimensions are the monitoring data.In addition, to prevent the redundancy and interaction of the multidimensional sensor data from adversely affecting the RUL prediction, this paper analyzes the degree of correlation among the sensors and the trend of degradation of each sensor over time to select the optimal sensor signal for the RUL prediction.The correlation and monotonicity measures can be expressed as follows [27].
where c and m represent correlation and monotonicity, respectively.F t represents the feature value at cycle t.δ(•) represents the unit step function.n indicates the quantity of the signals.The range of both c and m is [0, 1].c = 1 represents the complete correlation, and m = 1 represents the monotonic increasing or monotonic decreasing.Thus, sensors with a larger (c + m)/2 can better represent the changing trend and are selected.Take FD001 as an example: #2, #3, #4, #7, #8, #11, #12, #13, #15, #17, #20, and #21 are selected when the threshold is set to 0.75.
Moreover, the same sensor may have different measurements under different operating conditions.In order to reduce the effect of the operating conditions, the sensor data under each operating condition are min-max normalized so that the data size is limited to the range of [0, 1].Finally, a time window with a window size of 30 and a step size of 1 is utilized to divide the raw data to generate the training and testing datasets.

Simulation Conditions and Parameter Settings
The hardware and software environment of the simulation is an NVIDIA GeForce RTX 3060 Laptop GPU, an Intel Core i7-11800 H CPU, 32 G RAM, Windows 11, and Python 3.7, based on the PyTorch framework.Manufacturer is Lenovo (Beijing, China).
The BDnet consists of three main parts: a feature extractor, a dual regressor, and a subdomain adaptor.The main parameters are defined as follows: the time window length is set to 30, the number of loops is 200, the early stop mechanism is set, the degeneracy inflection point is 125, the hidden layer dimension of the dual regressor is 32, and the optimizer adopts the Adam optimizer.In addition, the performance of the prediction model may be affected by the different batch sizes in the neural network, the learning rate of the feature extractor, the learning rate of the dual regressor, the dimension of the hidden layer of the Bi-LSTM, the number of layers of the Bi-LSTM, and the scaling factor.Therefore, it is essential to investigate the adaptability and robustness of the model for different tasks by adjusting the hyperparameters.The hyperparameter configuration is shown in Table 2. Take FD001-FD002 as an example to discuss the influence of the hyperparameters on prediction performance, and the results are shown in Figure 4.As shown in Figure 4, the cross-domain prediction performance is best when the batch size is 256, the feature extractor learning rate is 0.005, the dual regressor learning rate is 0.01, the number of Bi-LSTM layers is five, the dimension of the Bi-LSTM hidden layer is 32, and λ is 0.3.The hyperparameter settings for all the cross-domain combinatorial research tasks are shown in Table 3.

023, 23, x FOR PEER REVIEW
Scaling factor (  ) 0.1, 0.3, 0.5, 0.7, 0 Take FD001-FD002 as an example to discuss the influence of the hyperparam prediction performance, and the results are shown in Figure 4.As shown in Figu cross-domain prediction performance is best when the batch size is 256, the fea tractor learning rate is 0.005, the dual regressor learning rate is 0.01, the numbe LSTM layers is five, the dimension of the Bi-LSTM hidden layer is 32, and  is hyperparameter settings for all the cross-domain combinatorial research tasks are in Table 3.

Evaluation Indicators
In order to measure the predictive performance of the proposed model, the scoring function (SF) and root-mean-square error (RMSE) are used for quantitative assessment.
The SF is an asymmetric function often used to evaluate the performance of the remaining life prediction.For the remaining life prediction of an aero engine, the ahead prediction is better than the lagged prediction, so the lagged prediction is more severely penalized than the ahead prediction for the same error.The SF is defined as Compared to the SF, the RMSE equally penalizes the over-advanced and lagged prediction errors.The RMSE is defined as where ŷi and y i are the predicted and actual values of the i-th sample of the testing set, respectively.N is the number of samples.

Experimental Results Analysis and Discussions
In order to validate the performance of the BDnet on the cross-domain prediction tasks, ablation experiments are first performed.The model that only utilizes the feature extractor and dual regressor directly trained with the source domain data on the target domain data is defined as model 1, which can be considered as the benchmark model for the RUL prediction problem.The model that uses the MCD is defined as model 2, and the model that uses both the MCD and the LMMD is defined as model 3.The three models' predictions, as mentioned above, are compared with the real RUL.The models are trained using the source domain data and evaluated on the target domain.Since the C-MAPSS dataset has four subsets, 12 cross-domain prediction tasks are generated, and the results are shown in Figure 5. First, model 3 gives the highest accuracy on all 12 cross-domain prediction tasks.Second, model 2 and model 3 are more stable and less fluctuating than model 1.
Taking FD001 as the source domain data, the cross-domain prediction results for the target domains of FD002, FD003, and FD004 are shown in Figure 5a-c, respectively.Since FD001 has a significant distributional discrepancy compared to FD002 and FD004, an apparent two-stage optimization structure can be seen, i.e., the distribution of the source and target domains is firstly brought closer by the GDA to learn the degradation trend of the target domain (red dashed line).Then, the prediction results are fine-tuned by the SDA to better fit with the real RUL (blue solid line).Although the data distributions of FD001 and FD003 are more similar, model 2 in Figure 5b gives the most significant prediction error compared to model 2 in Figure 5a,c.This is mainly because the maximum regressor difference is optimized by finding the difference features between the source and target domains.If the difference is slight, then multiple pieces of training may lead to overfitting.Taking FD001 as the source domain data, the cross-domain prediction results for the target domains of FD002, FD003, and FD004 are shown in Figure 5a-c, respectively.Since FD001 has a significant distributional discrepancy compared to FD002 and FD004, an apparent two-stage optimization structure can be seen, i.e., the distribution of the source and target domains is firstly brought closer by the GDA to learn the degradation trend of the target domain (red dashed line).Then, the prediction results are fine-tuned by the SDA to better fit with the real RUL (blue solid line).Although the data distributions of FD001 and FD003 are more similar, model 2 in Figure 5b gives the most significant prediction error compared to model 2 in Figure 5a,c.This is mainly because the maximum regressor difference is optimized by finding the difference features between the source and target domains.If the difference is slight, then multiple pieces of training may lead to overfitting.Therefore, we increase the degree of subdomain adaptivity by increasing the value of λ to achieve higher prediction accuracy.
Figure 5d-f demonstrate the cross-domain prediction results with the source domain as FD002 and the target domains as FD001, FD003, and FD004.Similar to the results with the source domain as FD001, the prediction accuracy is higher when the target domains are FD001 and FD003, and FD001 does not need subdomain adaptation because FD001 has only one failure mode compared to FD003.
Figure 5g-i demonstrate the cross-domain prediction results with FD003 as the source domain, and the prediction results of model 2 for all three tasks are significantly degraded, which is mainly due to the multi-fault modes.In addition, the degradation trend of the target domains, FD002 and FD004, is consistent with the real RUL.In contrast, the degradation trend of the target domain, FD001, shows a significant deviation from the real RUL.Thus, the λ of this group of experiments is increased in comparison with that of the previous two groups of experiments.
Like the last set of experiments, the source domain, FD004, also has two fault modes.However, the prediction results are better because FD004 has more training data and adequate feature extraction.
As shown in Table 4, model 1 has the worst prediction performance, and its prediction results heavily rely on the similarity between the source and target domain data distributions.For model 2, except for the four results of the RMSE and the two results of the SF (underlined) that are higher than their counterparts in model 1, the other results are better than their counterparts in model 1, and the result of the RMSE for FD002-FD001 in model 2 is the optimal result for the three models (bold).For model 3, all results are optimal for the three models, except for a result of the RMSE that is slightly higher than the corresponding result in model 2. Compared to model 1, the prediction model without domain adaptation, model 3 has an average reduction of 50.62% and 76.99% for the results of the RMSE and SF.Compared to model 2, with only global domain adaptation, model 3 has an average reduction of 35.16% and 47.66% for the results of the RMSE and SF.Therefore, in most cases, model 3 predicts better than model 1 and model 2, proving the effectiveness of the BDnet-based prediction model.Figure 6 depicts the distribution of the feature vectors extracted by the three models to further validate the migration learning effect based on the BDnet prediction model (t-SNE is used in this work).For model 1, the representations of the features extracted from the source and target domains are more dispersed in two-dimensional space, and the degradation trend is not apparent enough, which further illustrates the discrepancy in the data distribution under different operating conditions and fault modes.With global domain adaptation, the degradation trend shown in Figure 6b is pronounced, and there is only a slight deviation in the distribution of features extracted from the two domains.Fine-tuned by subdomain adaptation, the health status of the two domains in model 3 are clustered in the same region, and the degradation processes are well-aligned.Thus, the visualization results intuitively validate the effectiveness of the BDnet-based prediction model.
data distribution under different operating conditions and fault modes.With global domain adaptation, the degradation trend shown in Figure 6b is pronounced, and there is only a slight deviation in the distribution of features extracted from the two domains.Fine-tuned by subdomain adaptation, the health status of the two domains in model 3 are clustered in the same region, and the degradation processes are well-aligned.Thus, the visualization results intuitively validate the effectiveness of the BDnet-based prediction model.To assess the quality of the model in transferring the degradation patterns from a source to a target domain, the BDnet-based predictive model is compared with four mainstream UDA methods targeting the C-MAPSS dataset.TCA-DNN and CORAL-DNN are UDA methods that combine traditional machine learning algorithms with deep neural architectures.TCA [28] uses MMD to learn cross-domain migration components in the regenerative kernel RKHS to construct a feature space that minimizes the domain differences.CORAL [29] aligns the second-order statistical features of the distributions of the source and the target domains using linear transformations.LSTM-DNN [19] and CADA [20] are typical adversarial UDA methods.LSTM-DNN consists of a feature extractor, a RUL predictor, and a domain discriminator, whereas CADA adds a contrast loss estimation module to minimize adversarial loss.
Twelve engines were randomly selected for comparison experiments on 12 cross-domain prediction tasks, and Figure 7 demonstrates the combined comparison results of the 12 engines for the five methods.The proposed method achieves good prediction performance compared to the four mainstream methods.TCA-DNN and CORAL-DNN are basically unable to learn the trend information of the degradation process.Although LSTM-DNN can learn the trend information of the degradation process, the prediction results significantly differ from the actual value due to the lack of local subdomain adaptation.In contrast, CADA obtained better prediction results by the contrastive loss module for finetuning, but, compared with the BDnet, CADA's prediction results have greater volatility.To assess the quality of the model in transferring the degradation patterns from a source to a target domain, the BDnet-based predictive model is compared with four mainstream UDA methods targeting the C-MAPSS dataset.TCA-DNN and CORAL-DNN are UDA methods that combine traditional machine learning algorithms with deep neural architectures.TCA [28] uses MMD to learn cross-domain migration components in the regenerative kernel RKHS to construct a feature space that minimizes the domain differences.CORAL [29] aligns the second-order statistical features of the distributions of the source and the target domains using linear transformations.LSTM-DNN [19] and CADA [20] are typical adversarial UDA methods.LSTM-DNN consists of a feature extractor, a RUL predictor, and a domain discriminator, whereas CADA adds a contrast loss estimation module to minimize adversarial loss.
Twelve engines were randomly selected for comparison experiments on 12 crossdomain prediction tasks, and Figure 7 demonstrates the combined comparison results of the 12 engines for the five methods.The proposed method achieves good prediction performance compared to the four mainstream methods.TCA-DNN and CORAL-DNN are basically unable to learn the trend information of the degradation process.Although LSTM-DNN can learn the trend information of the degradation process, the prediction results significantly differ from the actual value due to the lack of local subdomain adaptation.In contrast, CADA obtained better prediction results by the contrastive loss module for finetuning, but, compared with the BDnet, CADA's prediction results have greater volatility.
The comparative experimental results of the RMSE are shown in Table 5.The proposed method is better than TCA-DNN, CORAL-DNN, and LSTM-DNN on all 12 cross-domain prediction tasks, with only four tasks being slightly lower than CADA, with an average reduction of 77.24%, 61.72%, 38.97%, and 3.35% in the RMSE values compared to the four comparison methods.
The comparative experimental results of the SF are shown in Table 6.The proposed method is better than TCA-DNN [19], CORAL-DNN [19], and LSTM-DNN [19] on all 12 cross-domain prediction tasks, with six tasks being slightly lower than CADA [20], with an average reduction of 42.12% in the RMSE values compared to CADA.
In addition, the BDnet significantly improves the prediction performance of the larger domain offset prediction task while maintaining the performance of the smaller domain offset prediction task, proving the superiority of the BDnet without introducing other auxiliary models.

Conclusions
In this paper, an unsupervised domain adaptive regression method based on the BDnet was proposed to address the realistic problem of unlabeled samples in the aero engine RUL prediction task.Unlike previous adversarial DA methods that introduce discriminators for domain feature alignment, the BDnet achieves the GDA by maximizing the regressor difference to gradually detect and adjust the target domain samples with significant domain shifts, without introducing an additional model.In addition, to extract the fine-grained features of the aero engine at different degradation stages, the LMMD is utilized to achieve the SDA based on the GDA.Finally, experiments are conducted on 12 cross-domain prediction tasks to verify the effectiveness of the BDnet-based RUL prediction model in solving the unsupervised domain adaptive regression problem.

Figure 1 .
Figure 1.The schematic of global alignment and sub-domain alignment.

Figure 1 .
Figure 1.The schematic of global alignment and sub-domain alignment.

Figure 2 .
Figure 2. The difference in distribution for each C-MAPPS dataset.

Figure 2 .
Figure 2. The difference in distribution for each C-MAPPS dataset.
the offline training phase, firstly, the source domain data, D S =

Algorithm 1 1 : 2 :
RUL prediction based on bi-discrepancy network Input: Train data D S = domain, Maximum number of training epoch max .Selection variable and swipe the time window to generate samples.Initialize randomly the model parameters θ f , θ R, θ R .3:Initialize epoch = 1.4:while epoch ≤ epoch max do 5:Input D S to Bi-Discrepancy Network for calculating regression loss L R through Equation (2).6:

Figure 5 .
Figure 5. RUL predictions of three models for 12 randomly selecting engines coming from the target domain.

Figure 5 .
Figure 5. RUL predictions of three models for 12 randomly selecting engines coming from the target domain.First, model 3 gives the highest accuracy on all 12 cross-domain prediction tasks.Second, model 2 and model 3 are more stable and less fluctuating than model 1.Taking FD001 as the source domain data, the cross-domain prediction results for the target domains of FD002, FD003, and FD004 are shown in Figure5a-c, respectively.Since FD001 has a significant distributional discrepancy compared to FD002 and FD004, an apparent two-stage optimization structure can be seen, i.e., the distribution of the source and target domains is firstly brought closer by the GDA to learn the degradation trend of the target domain (red dashed line).Then, the prediction results are fine-tuned by the SDA to better fit with the real RUL (blue solid line).Although the data distributions of FD001 and FD003 are more similar, model 2 in Figure5bgives the most significant prediction error compared to model 2 in Figure5a,c.This is mainly because the maximum regressor difference is optimized by finding the difference features between the source and target domains.If the difference is slight, then multiple pieces of training may lead to overfitting.

Figure 6 .
Figure 6.Visualization results of the learned features by three methods of No. 4 engine from source domain and No. 5 engine from target domain for FD001-FD002 cross-domain prediction tasks.

Figure 6 .
Figure 6.Visualization results of the learned features by three methods of No. 4 engine from source domain and No. 5 engine from target domain for FD001-FD002 cross-domain prediction tasks.

Table 1 .
The description of aircraft turbofan engine dataset.

Table 3 .
Hyperparameters configuration under various cross-domain prediction tasks.

Table 3 .
Hyperparameters configuration under various cross-domain prediction tasks.

Table 4 .
Comparison of RMSE and SF for three models under various cross-domain prediction tasks.

Table 6 .
Comparison of SF for the proposed method against state-of-the-art approaches.